    The Design of Terra: Harnessing the Best Features of High-Level and Low-Level Languages

    Applications are often written in a combination of high-level and low-level languages, since this allows performance-critical parts to be carefully optimized while other parts are written more productively. This approach is used in web development, in game programming, and in build systems for applications themselves. However, most languages were not designed with interoperability in mind, resulting in glue code and duplicated features that add complexity. We propose a two-language system in which both languages were designed to interoperate. We use Lua as our high-level language, since it was originally designed with interoperability in mind. We create a new low-level language, Terra, designed to interoperate with Lua. It is embedded in Lua and meta-programmed from it, but has a low level of abstraction suited to writing high-performance code. We discuss the key design decisions (compartmentalized runtimes, glue-free interoperation, and meta-programming features) that make Lua and Terra more powerful than the sum of their parts.
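
    Since this listing contains no code, the following is a minimal Python sketch of the staging idea the abstract describes: a high-level language that generates, compiles, and calls a specialized low-level kernel at runtime. It is an analogy only (the paper embeds Terra in Lua, not C in Python); it assumes a C compiler invokable as "cc" on PATH, and the function name compile_saxpy is hypothetical.

        # Analogy to Lua/Terra staging: Python plays the high-level,
        # metaprogramming role; generated C plays the low-level role.
        import ctypes, os, subprocess, tempfile

        def compile_saxpy(n):
            # Stage a low-level kernel specialized for a fixed length n.
            src = f"""
            void saxpy(float a, float *x, float *y) {{
                for (int i = 0; i < {n}; i++) y[i] = a * x[i] + y[i];
            }}
            """
            d = tempfile.mkdtemp()
            c_path, so_path = os.path.join(d, "k.c"), os.path.join(d, "k.so")
            with open(c_path, "w") as f:
                f.write(src)
            subprocess.check_call(["cc", "-O2", "-shared", "-fPIC",
                                   c_path, "-o", so_path])
            lib = ctypes.CDLL(so_path)
            lib.saxpy.argtypes = [ctypes.c_float,
                                  ctypes.POINTER(ctypes.c_float),
                                  ctypes.POINTER(ctypes.c_float)]
            return lib.saxpy

        saxpy4 = compile_saxpy(4)
        x = (ctypes.c_float * 4)(1, 2, 3, 4)
        y = (ctypes.c_float * 4)(0, 0, 0, 0)
        saxpy4(2.0, x, y)
        print(list(y))  # [2.0, 4.0, 6.0, 8.0]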

    The unexplained nature of reading.

    The effects of word properties on reading-aloud response times (RTs) are a major source of evidence about the reading process. The precision with which such RTs could potentially be predicted from word properties is critical for evaluating our understanding of reading, but is often underestimated due to contamination from individual differences. We estimated this precision, free of such contamination, individually for four people who each read 2,820 words 50 times. These estimates were compared with the precision achieved by a 31-variable regression model that outperforms current cognitive models on variance-explained criteria. Most (around two-thirds) of the meaningful (non-first-phoneme, non-noise) word-level variance remained unexplained by this model. Considerable empirical and theoretical-computational effort has been expended in this area of psychology, but the large amount of systematic variance left unexplained casts doubt on contemporary accounts of the word-level mechanisms of reading. Future assessment of models can take advantage of the availability of our precise participant-level database.
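
    As an illustration of the variance-accounting logic in this abstract (not the paper's actual analysis), the sketch below uses synthetic data to compare a repeated-measures ceiling on explainable word-level variance against the R-squared of a simple regression. Every number and predictor here is invented.

        import numpy as np

        rng = np.random.default_rng(0)
        n_words, n_reps = 2820, 50
        true_rt = rng.normal(600, 80, n_words)  # systematic word-level RTs (ms)
        trials = true_rt[:, None] + rng.normal(0, 120, (n_words, n_reps))
        mean_rt = trials.mean(axis=1)

        # Ceiling: the share of variance in mean RTs that is systematic
        # rather than residual trial noise (the noise variance of a mean of
        # n_reps trials is the trial variance divided by n_reps).
        noise_var = trials.var(axis=1, ddof=1).mean() / n_reps
        ceiling = 1 - noise_var / mean_rt.var(ddof=1)

        # A toy word-property regression explaining part of that variance.
        X = np.column_stack([true_rt + rng.normal(0, 60, n_words),
                             rng.normal(size=n_words)])
        A = np.column_stack([np.ones(n_words), X])
        beta, *_ = np.linalg.lstsq(A, mean_rt, rcond=None)
        resid = mean_rt - A @ beta
        r2 = 1 - (resid ** 2).sum() / ((mean_rt - mean_rt.mean()) ** 2).sum()
        print(f"explainable ceiling ~ {ceiling:.2f}, model R-squared ~ {r2:.2f}")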

    Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

    Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but implementing them in real code is tedious, especially when optimizing for real-time performance on modern GPUs in interactive applications. In this work, we propose a new language, Opt (available at http://optlang.org), for writing these objective functions over image- or graph-structured unknowns concisely and at a high level. Our compiler automatically transforms these specifications into state-of-the-art GPU solvers based on Gauss-Newton or Levenberg-Marquardt methods. Opt can generate different variations of the solver, so users can easily explore tradeoffs in numerical precision, matrix-free methods, and solver approaches. In our results, we implement a variety of real-world graphics and vision applications. Their energy functions are expressible in tens of lines of code and compile to highly optimized GPU solver implementations. These solvers perform competitively with the best published hand-tuned, application-specific GPU solvers, and orders of magnitude faster than a general-purpose auto-generated solver.
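
    The abstract names Gauss-Newton as one of the solver methods Opt generates; below is a minimal NumPy sketch of a Gauss-Newton loop on a toy curve-fitting objective. It is illustrative only: Opt itself compiles such solvers to GPU code from a high-level energy specification, and the problem and names here are hypothetical.

        import numpy as np

        def gauss_newton(residual, jacobian, p, iters=20):
            # Iteratively linearize the residual and solve the normal
            # equations J^T J dp = -J^T r for the parameter update.
            for _ in range(iters):
                r, J = residual(p), jacobian(p)
                p = p + np.linalg.solve(J.T @ J, -J.T @ r)
            return p

        t = np.linspace(0, 1, 50)
        y = 2.0 * np.exp(1.5 * t)  # synthetic observations of a*exp(b*t)

        def residual(p):
            a, b = p
            return a * np.exp(b * t) - y

        def jacobian(p):
            a, b = p
            e = np.exp(b * t)
            return np.column_stack([e, a * t * e])  # dr/da, dr/db

        print(gauss_newton(residual, jacobian, np.array([1.0, 1.0])))  # ~ [2.0, 1.5]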

    MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

    Training and deploying large machine learning (ML) models is time-consuming and requires significant distributed computing infrastructure. Based on real-world large-model training on datacenter-scale infrastructure, we show that 14% to 32% of all GPU hours are spent on communication with no overlapping computation. To minimize this exposed communication latency, we develop an agile performance-modeling framework to guide parallelization and hardware-software co-design strategies. Using a suite of real-world large ML models on state-of-the-art GPU training hardware, we demonstrate potential throughput improvements of 2.24x for pre-training and 5.27x for inference.
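
    A sketch of the kind of analytical reasoning such a performance model rests on (an assumption of ours, not the paper's actual framework): if a fraction of per-step communication can be overlapped with computation, only the remainder is exposed, and throughput improves by the ratio of step times. The numbers below are illustrative only.

        def step_time(compute_s, comm_s, overlap_frac=0.0):
            # Overlapped communication hides under compute (up to the
            # compute time); the rest is exposed and serializes with it.
            hidden = min(comm_s * overlap_frac, compute_s)
            return compute_s + (comm_s - hidden)

        base = step_time(compute_s=0.70, comm_s=0.30)  # all comm exposed
        tuned = step_time(compute_s=0.70, comm_s=0.30, overlap_frac=0.9)
        print(f"throughput gain ~ {base / tuned:.2f}x")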

    Just-in-time Length Specialization of Dynamic Vector Code

    Dynamically typed vector languages are popular in data analytics and statistical computing. In these languages, vectors have both dynamic type and dynamic length, making static generation of efficient machine code difficult. In this paper, we describe a trace-based just-in-time compilation strategy that performs partial length specialization of dynamically typed vector code. This selective specialization is designed to avoid excessive compilation overhead while still enabling the generation of efficient machine code through length-based optimizations such as vector fusion, vector copy elimination, and the use of hardware SIMD units. We have implemented our approach in a virtual machine for a subset of R, a vector-based statistical computing language. Across a variety of workloads containing both scalar and vector code, we show performance near that of auto-vectorized C over a large range of vector sizes.
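
    As a toy illustration of length specialization (in Python rather than the paper's R virtual machine), the sketch below compiles and caches a fused, fully unrolled kernel per observed vector length, so the loop bound becomes a compile-time constant the way a length-specializing JIT would treat it. The names and caching policy are hypothetical.

        _kernels = {}  # cache of kernels specialized by vector length

        def fused_axpy(a, x, y):
            n = len(x)
            kernel = _kernels.get(n)
            if kernel is None:
                # Specialize on n: unroll the loop so the length is a
                # constant, the kind of fact a real JIT would exploit for
                # fusion, copy elimination, and SIMD code generation.
                src = "def k(a, x, y):\n    pass\n"
                src += "".join(f"    y[{i}] += a * x[{i}]\n" for i in range(n))
                ns = {}
                exec(compile(src, f"<axpy_{n}>", "exec"), ns)
                kernel = _kernels[n] = ns["k"]
            kernel(a, x, y)

        y = [0.0, 0.0, 0.0]
        fused_axpy(2.0, [1.0, 2.0, 3.0], y)
        print(y)  # [2.0, 4.0, 6.0]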